New upper bounds on cross-validation for the k-Nearest Neighbor classification rule

نویسندگان

  • Alain Celisse
  • Tristan Mary-Huard
چکیده

The present work addresses binary classification by use of the k-nearest neighbors (kNN) classifier. Among several assets, it belongs to intuitive majority vote classification rules and also adapts to spatial inhomogeneity, which is particularly relevant in high dimensional settings where no a priori partitioning of the space seems realistic. However the performance of the kNN classifier crucially depends on the number k of neighbors that will be considered. To calibrate the parameter k, cross-validation procedures such as V -fold or leave-one-out are usually used. But on the one hand these procedures can become highly time-consuming. On the other hand, not that much theoretical guaranties do exist on the performance of such procedures. Recently [11] have derived closed-form formulas for the leave-p-out estimator of the kNN classifier performance. Such formulas now allow to efficiently perform cross-validation. The main purpose of the present article is twofold: First, we provide a new strategy to derive bounds on moments of the leave-p-out estimator used to assess the performance of the kNN classifier. This new strategy exploits the link between leave-p-out and U -statistics as well as the generalized Efron-Stein inequality. Second, these moment upper bounds are used to settle a new exponential concentration inequality for

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Locally Determining the Number of Neighbors in the k-Nearest Neighbor Rule Based on Statistical Confidence

The k-nearest neighbor rule is one of the most attractive pattern classification algorithms. In practice, the value of k is usually determined by the cross-validation method. In this work, we propose a new method that locally determines the number of nearest neighbors based on the concept of statistical confidence. We define the confidence associated with decisions that are made by the majority...

متن کامل

Neighborhood size selection in the k-nearest-neighbor rule using statistical confidence

The k-nearest-neighbor rule is one of the most attractive pattern classification algorithms. In practice, the choice of k is determined by the cross-validation method. In this work, we propose a new method for neighborhood size selection that is based on the concept of statistical confidence. We define the confidence associated with a decision that is made by the majority rule from a finite num...

متن کامل

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

Identification of selected monogeneans using image processing, artificial neural network and K-nearest neighbor

Abstract Over the last two decades, improvements in developing computational tools made significant contributions to the classification of biological specimens` images to their correspondence species. These days, identification of biological species is much easier for taxonomist and even non-taxonomists due to the development of automated computer techniques and systems.  In this study, we d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015